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Many of life's most fascinating phenomena emerge from interactions among many elements — many 
amino acids determine the structure of a single protein, many genes determine the fate of a cell, 
many neurons are involved in shaping our thoughts and memories. Physicists have long hoped that 
these collective behaviors could be described using the ideas and methods of statistical mechanics. 
In the past few years, new, larger scale experiments have made it possible to construct statistical 
mechanics models of biological systems directly from real data. We review the surprising successes 
of this "inverse" approach, using examples form families of proteins, networks of neurons, and flocks 
of birds. Remarkably, in all these cases the models that emerge from the data are poised at a very 
special point in their parameter space — a critical point. This suggests there may be some deeper 
theoretical principle behind the behavior of these diverse systems. 



I. INTRODUCTION 

One of the great triumphs of twentieth century science 
was the identification of the molecular building blocks 
of life. From the DNA molecules whose sequence and 
structure control the flow of genetic information, to the 
ion channels and receptors whose dynamics govern the 
flow of information in the brain, these building blocks 
are, to a remarkable extent, universal, shared among all 
forms of life on earth. Despite the importance of this 
reduction to elementary constituents, most of what we 
recognize as the phenomena of life are not properties of 
single molecules, but rather emerge from the interactions 
among many molecules. Almost by definition, what we 
find especially interesting about the behavior of multicel- 
lular organisms (like us) emerges from interactions among 
many cells, and the most striking behaviors of animal 
(and human) populations are similarly collective. 

For decades, physicists have hoped that the emergent, 
collective phenomena of life could be captured using ideas 
from statistical mechanics. The stationary states of bio- 
logical systems have a subtle structure, neither "frozen" 
into a well ordered crystal, nor chaotic and disordered 
like a gas. Further, these states are far from equilibrium, 
maintained by a constant flow of energy and material 
through the system. There is something special about 
the states corresponding to functional, living systems, 
but at the same time it cannot be that function depends 
on a fine tuning of parameters. Of the many ideas rooted 
in statistical physics that have been suggested to charac- 
terize these states, perhaps the most intriguing — and the 
most speculative — is the idea of self-organized criticality. 

The theory of self-organized criticiality has its origin 
in models for inanimate matter (sandpiles, earthquakes, 
etc.) pj, but the theory was then extended and adapted 
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to encompass biological systems through the analysis of 
simple toy models [2]. As an example, simple models 
for the evolution of interacting species can self-organize 
to a critical state in which periods of quiescence are in- 
terrupted by "avalanches" of all sizes 2] i which reminds 
us of the idea of punctuated equilibria in evolution [4]. 
Similarly, it was suggested that the brain is in a self- 
organized critical state, at the boundary between being 
nearly dead and being fully epileptic [5,. It now seems 
unlikely that some of the initial ideas were correct (e.g., 
real sand behaves very differently from the models), but 
possibility that biological system poise themselves at or 
near a critical point remains tantalizing. 

Despite the enthusiasm for using ideas from statistical 
physics to think about biological systems, the connec- 
tions between the models and the experimentally measur- 
able quantities often has been tenuous. Even in the case 
of neural networks, where statistical physics approaches 
are perhaps best developed [SHn], the relationship be- 
tween the models and the dynamics of real neurons is 
somewhat loose. For the ideas of criticality, it might not 
be too harsh to suggest that much of what has been done 
is at the level of metaphor, rather than calculations which 
could be tested against real data. 

In the past decade or so, there has been an important 
development in the experimental investigation of biolog- 
ical networks, and this suggests a very different route to 
the use of ideas from statistical physics. While it has 
long been conventional to monitor the activity or state 
of individual elements in a network, it is now possible 
to monitor many elements in parallel. The technologies 
are specific to each class of systems — large arrays of elec- 
trodes recording from many neurons in parallel |10l lllj , 
high throughput sequencing to probe large ensembles of 
amino acid sequences |12| . accurate imaging to track in- 
dividual animals in large groups |13H17j — and each mea- 
surement of course has its own limitations. Nonetheless, 
the availability of these new experiments has led sev- 
eral groups to try constructing statistical physics models 
directly from the data. A remarkable feature of these 
analyses, scattered across many levels of organization, 
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is the appearance of signatures of criticality. Whereas 
twenty-five years ago we had a grand theory with httle 
connection to data, we now have many isolated discus- 
sions of particular experiments hinting at similar conclu- 
sions. Our goal here is to bring these analyses together, 
perhaps rekindling the hopes for a more general theory. 

II. ZIPF'S LAW AND CRITICALITY 

In the usual examples of critical phenomena, there are 
some natural macroscopic variables with a singular de- 
pendence on parameters that we can control experimen- 
tally. A familiar example is that we can identify the 
liquid/gas critical point by measuring the density of the 
fluid as a function of temperature and pressure. It is 
worth noting that, sometimes, doing experiments that 
couple to the correct macroscopic variables is difficult, as 
in the Bishop-Reppy experiments on superfluid helium 
films [18J. In many cases one can also identify criticality 
in purely thermodynamic measurements, as a singular- 
ity in the heat capacity as a function of temperature, or 
through the behavior of the correlation function of fluctu- 
ations in some local variable, such as the magnetization 
in a magnet. 

The difficulty in biological systems is that they are 
not really equilibrium statistical mechanics problems, so 
there is no guarantee that we can find relevant macro- 
scopic variables, and certainly it is not clear how to 
change the temperature. Even if an Ising spin glass is 
the correct description of a neural network, for example 
[TM^ , it is not clear how to measure the analog of the 
magnetic susceptibility. Nonetheless it may be true that 
the probability of finding the system in a particular state 
is governed by a probability distribution that is mathe- 
matically equivalent to the Boltzmann distribution for a 
system poised at a critical point. 

Let us denote by u the state of a system. Typically, cr 
is a multi-dimensional variable cr — (cti, . . . , (Jat), where 
<Ji can be a spin, a letter in a word, the spiking activity 
of a neuron, an amino acid in a peptide chain, or the 
vector velocity of bird in a fiock. Let us then denote by 
P(<t) the probability of finding the system in the state cr. 
One can formally write this probability as a Boltzmann 
distribution: 

P{a) = le-^(-)/'^-^, (1) 

where kB is Boltzmann's constant and Z the partition 
function. Without loss of generality we can set the tem- 
perature to fc^T = 1 and Z to 1, which leads to the 
following definition for the energy: 

E{rT)^~\o^P{a). (2) 

With the availability of large datasets of biological sys- 
tems, it now seems possible to construct P(cr) directly 
from the data, and to take the corresponding energy func- 
tion -B(cr) seriously as a statistical mechanics problem. In 



this section we explore the consequences of that idea, by 
showing the equivalence between Zipf 's law of language 
and the critical properties of the associated statistical 
mechanics model. 

In our modern understanding of critical phenomena in 
equilibrium systems, a central role is played by power 
law dependencies. Indeed, the exponents of these power 
laws — describing the dependence of correlations on dis- 
tance, or the divergence of thermodynamic quantities as 
a function of temperature — are universal, and reflect fun- 
damental features of the underlying field theory that de- 
scribes the long wavelength behavior of the system. Self- 
organized critical systems also exhibit power laws, for ex- 
ample in the distribution of sizes of the avalanches that 
occur as a sandpile relaxes pi . Power laws have also been 
observed empirically in a wide variety of non-equilibrium 
systems [32], although many of these claims do not sur- 
vive a rigorous assessment ^23 . It is also fair to note that, 
in contrast to the case of equilibrium critical phenomena, 
the observation of power laws in these more exotic cases 
has not led to anything like a general theory. 

There is a very old observation of a power law in a 
biological system, and this is Zipf's law in language [24) . 
first observed by Auerbach in 1913 [5S]. In contrast to 
examples such as avalanches, where power laws describe 
the dynamics of the system, Zipf's law really refers to 
the distribution over states of the system, in the same 
way that the Boltzmann distribution describes the dis- 
tribution over states of an equilibrium system. Specifi- 
cally, in written language we can think of the state of the 
system as being a single word cr, and as texts or con- 
versations proceed they sample many such states. If one 
orders (ranks) words cr by their decreasing frequency 
P(cr), Zipf's law states that the frequency of words Pier) 
decays as the inverse of their rank r(cr): 



This distribution cannot be normalized when the num- 
ber of words is infinite. This can be corrected either by 
introducing a cutoff corresponding to a finite vocabulary, 
or by slightly modifying the law to P = r~"/^(a), with 
a > 1 and (,{c») is Riemann's zeta function. Since its 
introduction in the context of language, Zipf's law has 
been observed in all branches of science, but has also at- 
tracted a lot of criticism, essentially for the same reasons 
as other power laws, but also because of the controversial 
claim by Zipf himself that his law was characteristic of 
human language. 

Despite all our concerns, Zipf's law is, in a certain 
precise sense, a signature of criticality |26]. To see this, 
consider the density of states, obtained just by counting 
the number of states in a small window (5£', The density 
of states is the number of states within a small energy 
bracket: 
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where I[x] is the indicator function. This density of states 
is the exponential of the entropy, and in the thermody- 
namic hmit the energy and the entropy both should scale 
with the system's size N: 

SiE) = log pse{E) = Ns(e = E/N) + si, (5) 

where si is sub-extensive, that is \impf^oo{si/N) — 0. 
The bin size SE only affects the sub-extensive corrections 
as 6E — >• 0, and can be ignored for very large systems. 
But for real data and finite N, the choice of the bin size 
SE can be problematic, and it is useful to consider instead 
the cumulative density of states: 



U{E) = Y,^[E{(t)<E]= dE'psE^E'). 



(6) 



For large systems, this integral is dominated by the max- 
imum of the integrand, and the two definitions for the 
density of states become equivalent: 



(7) 



E/N 



de' exp [N (s(e') + Si/N)] (8) 

J — OO 

^e^^('), (9) 



log7V(£;) - Ns{E/N) = S{E). 



(10) 



But the rank r(cr) is exactly the cumulative density of 
states at the energy of cr: 

r{cT) = N[E = E{ct)], 

that is, the number of states that are more frequent (or of 
lower energy) than cr, and so in general we expect that, 
for large systems, 

S[E{a)]^\ogria). (11) 
Zipf 's law tell us that probabilities are related to ranks. 
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-r-"(<T) 



- log P{cr) = a log r(cr) + log ((a). 



(12) 



But now we can connect probabilities to energy, from Eq 
([2]), and ranks to entropy, from Eq (11), to give 



S{E) = 



E 



(13) 



where again • • • is sub-extensive. In words, Zipf 's law for 
a very large system is equivalent to the statement that 
the entropy is an exactly linear function of the energy. 

A perfectly linear relation between entropy and energy 
is very unusual. To see why — and to make the connection 
to criticality — let's recall the relation of the (canonical) 
partition function to the energy /entropy relationship. As 
usual we have 



-E{(T)/kBT 



(14) 



where we have reintroduced a fictious temperature T. 
The "operating temperature," i.e. the temperature of 
the original distribution, is k^T — 1. Then we have 



Z{T) = J dEp{E)t 



-E/kBT 



(15) 



where p{E) is the density of states as before. But in the 
same large N approximations used above, we can write 



Z{T) ^ J dEp{E)e-'^'^^'^ 



j deexp[iV(s(e) - e/fcsT)] 



(16) 
(17) 



For large N , this integral is dominated by the largest 
term of the integrand, which is the point where ds/de = 
l/ksT] this much is standard, and true for all sys- 
tems. But in the special case of Zipf's law, we have 
ds/de = l/a, for all energies. What this really means 
is that ksT = a is a (very!) critical point: for any 
ksT < a, the system freezes into a ground state of zero 
energy and zero entropy, while for kgT > a the system 
explores higher energies with ever higher probabilities, 
and all thermodynamic quantities diverge if Zipf's law 
holds exactly. 

Clearly, not all critical systems are described by a den- 
sity of states as restrictive as in Eq (13 1. Systems ex- 



hibiting a first order transition have at least one energy 
E for which S"{E) < 0, and systems with a second order 
phase transition are characterized by the existence of an 
energy where S"{E) = 0. The specific heat, whose diver- 
gence serves to detect second order phase transitions, can 
be related to the second derivative of the micocanonical 
entropy: 



C{T) = 



N 
2^2 



d^S{E) 



dE^ 



(18) 



What is truly remarkable about Zipf's law, and its cor- 
relate Eq ( |13| , is that S"{E) — at all energies, mak- 
ing Zipf's law a very strong signature of criticality. A 
tangible consequence of this peculiar density of states is 
that the entropy is sub-extensive below the critical point, 
S/N 0. For real data, finite size effects will compli- 
cate this simple picture, but this argument suggests that 
critical behaviour can considerably reduce the space of 
explored states, as measured by the entropy. In later sec- 
tions, we will see examples of biological data which obey 
Zipf's law with surprising accuracy, and this observation 
will turn out to have practical biological consequences. 



III. MAXIMUM ENTROPY MODELS 

Systems with many degrees of freedom have a daunt- 
ingly large number of states, which grows exponentially 
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with the system's size, a phonemon sometimes called the 
'curse of dimensionality'. Because of that, getting a good 
estimate of P(<t) from data can be impractical. The prin- 
ciple of maximum entropy [571 HH] is a strategy for dealing 
with this problem by assuming a model that is as random 
as possible, but that agrees with some average observ- 
ables of the data. As we will see, maximum entropy mod- 
els naturally map onto known statistical physics models, 
which will ease the study of their critical properties. 

In the maximum entropy approach, the real (but un- 
known) distribution Pr{cr) is approximated by a model 
distribution P^{(t) that maximizes the entropy |29j : 



cr ) was produced by the model: 



5[p] = -^PHiogPH, 



and that satistifies 



(19) 



(20) 



where Oi, O2, ■ ■ ■ are observables of the system, and 
and (•),n are averages taken with and Pm respectively. 
The key point is that often average observables {Oa)i 
can be estimated accurately from the data, even when 
the whole distribution Pr(c) cannot. Oa is typically a 
moment of one or a few variables, but it can also be a 
global quantity of the system. Using the technique of 
Lagrange multipliers, one can write the explicit form of 
the model distribution: 



(21) 



/3i,/32,-.. are the Lagrange multiphers associated to 
the constraints (20) and constitute the fitting parame- 



ters of the model. When the maximum entropy model 
is constrained only by the mean value of the energy, 
0{<t) = —E{a), we recover the Boltzmann distribution, 
Pm(cr) = Z-^e-^^e^), where (3 = l/keT is the inverse 
temperature. More generally, the exponential form of 
the distribution (21) suggests to define the energy as: 



There exists a unique set of Lagrange multipliers that 
satisfies all the constraints, but finding them is a com- 
putationally difficult inverse problem. Inverse problems 
in statistical mechanics have a long history, which goes 
at least as far back as Keller and Zumino, who infered 
microscopic interaction potentials from thermodynamic 
quantities [3^ . The special case of binary variables con- 
strained by pairwise correlations was formulated in 1985 
by Ackley, Hinton, and Sejnowski in their discussion of 
"Boltzmann machines" as models for neural networks 
[31] . Solving the inverse problem is equivalent to mini- 
mizing the KuUback-Leibler divergence between the real 
and the model distribution (pTl), defined as: 



DKL{PA\Pr.) = E P^^) log #S' (22) 

or equivalently, to maximizing the log-likelihood C that 
the experimental data (given by M independent draws 



M 



/: = iogn^m(T") 

a=l 

= M^P,.(^)logP,,(<T) 
cr 

= M{S[P,]~DKL{PA\Pm)}■ 



{23) 



where, by definition, Pr{cr) ~ (lA-^) Sfli '^c,cr<'. In 
fact, one has: 



dDKLjPAlPm) 

dfia 



= {Oa) 



(24) 



which ensures that the constraints (20) are satisfied at 



the minimum. This explicit expression of the derivatives 
suggests to use a gradient descent algorithm, with the 
following update rules for the model parameters: 



/3a^/3a+?7((a)r-(Oa>m), 



(25) 



where 77 is a small constant, the "learning rate." Note 
that in this framework, the inverse problem is in fact 
broken down into two tasks: estimating the mean observ- 
ables (Oa)m within the model distribution for a given set 
of parameters Pa (direct problem); and implementing an 
update rule such as ( 25 ) that will converge to the right 
PaS (inverse problem). The direct problem is compu- 
tationally costly, as it requires to sum over all possible 
states (J. Approximate methods have been proposed to 
circumvent this difficulty. Monte Carlo algorithms have 
been commonly used [101 1311 132] and have been improved 
by techniques such as histrogram sampling [31] . Approx- 
imate analytic methods, such as high temperature ex- 
pansions [351 IM| or message-passing algorithms [371 13H] j 
were also developed, and shown to be fast and accurate 
in the perturbative regime of weak correlations. 

Note that even when a solution to the inverse problem 
can be found, one still needs to evaluate whether the max- 
imum entropy distribution correctly describes the data, 
for example by testing its predictions on local and global 
observables that were not constrained by the model. In 
the following two sections we present examples in which 
maximum entropy models were successfully fitted to real 
biological data, and analyzed to reveal their critical prop- 
erties. We then turn to other approaches that also point 
to the criticality of diff'erent biological systems. 



IV. NETWORKS OF NEURONS 

Throughout the nervous systems of almost all animals, 
neurons communicate with one another through discrete, 
stereotyped electrical pulses called action potentials or 
spikes [33. Thus, if we look in a brief window of time 
At, the activity of a neuron (denoted by i) is binary: in 
this brief window, a neuron either spikes, in which case 
we assign it 0"^ = 1, or it does not, and then ai = — 1. 
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In this notation the binary string or 'spike word' <7 = 
{a I, . . . ,<7n) entirely describes the spiking activity of a 
network of N neurons, and the probabihty distribution 
P{(t) over aU 2^ possible spiking states describes the 
correlation structure of the network, as well as defining 
the "vocabulary" that the network has at its disposal 
to use in representing sensations, thoughts, memories or 
actions. 

For large networks, sampling all 2^ words is of course 
impractical. For many years, much attention was focused 
on the behavior of single neurons, and then on pairs. 
An important observation is that correlations between 
any two neurons typically are weak, so that the corre- 
lation coefhcient between ai and (Jj^i is on the order of 
0.1 or less. It is tempting to conclude that, physicists' 
prejudices notwithstanding, neurons are approximately 
independent, and there are no interesting collective ef- 
fects. As soon as it became possible to record simultane- 
ously from many neurons, however, it became clear that 
this was wrong, and that, for example, larger groups of 
neurons spike simultaneously much more frequently than 
would be expected if spiking were independent in every 
cell [1^. It is not clear, however, how to interpret such 
data. It might be that there are specific sub-circuits 
in the network that link special groups of many cells, 
and it is these groups which dominate the patterns of 
simultaneous spiking. Alternatively, the network could 
be statistically homogenous, and simultaneous spiking of 
many cells could emerge as a collective effect. An im- 
portant hint is that while correlations are weak, they are 
widespread, so that any two neurons that plausibly are 
involved in the same task are equally likely to have a 
significant correlation. 

To make this discussion concrete, it is useful to think 
about the vertebrate retina. The retina is an ideal place 
in which to test ideas about correlated activity, because 
it is possible to make long and stable recordings of many 
retinal ganglion cells — the output cells of the retina, 
whose axons bundle together to form the optic nerve — 
as they respond to visual stimuli. In particular, because 
the retina is approximately flat, one can record from the 
output layer of cells by placing a piece of the retina on 
an array of electrodes that have been patterned onto to 
a glass slide, using conventional methods of microfabri- 
cation. Such experiments routinely allow measurements 
on ~ 100 neurons, in some cases sampling densely from 
a small region of the retina, so that this represents a sig- 
nificant fraction of all the cells in the area covered by the 
electrode array [lOullj . 

The average rate at which neuron i generates spikes 
is given by fi = ((1 -I- ai)/2) / At, so that knowing the 
average rates is the same as knowing the local magne- 
tizations ((Ti). The maximum entropy model consistent 
with these averages, but with no other constraints, is a 
model of independently firing cells, from Eq. (21): 



where hi is the Lagrange multiplier associated to the av- 
erage observable (ci). Although the independent model 
may correctly describe the activity of small groups of 
neurons, it is often inconsistent with some global prop- 
erties of the network. For example, for the retina stimu- 
lated by natural movies j,191, the distribution of the total 
number of spikes K = X^j^i(l -I- cTi)/2 is observed to be 
approximately exponential [P{K) « e~^/^], while an in- 
dependent model predicts Gaussian tails. This suggests 
that correlations strongly determine the global state of 
the network. 

As the first step beyond an independent model, one can 
look for the maximum entropy distribution that is consis- 
tent not only with {<7i), but also with pairwise correlation 
functions between neurons luia^). The distribution then 



takes a familiar form: 



N 



-E((T) 



E{ct) 



i<j 



(27) 

where Jij is the Lagrange multiplier associated to {uiUj). 
Remarkably, this model is mathematically equivalent to a 
disordered Ising model, where hi are external local fields, 
and Jij exchange couplings. Ising models were first in- 
troduced by Hopfield in the context of neural networks to 
describe associative memory j6j. The maximum entropy 
approach allows for a direct connection to experiments, 
since all the parameters hi and Jij are determined from 
data. 

Maximum entropy distributions consistent with pair- 
wise correlations, as in Eq (27), were fitted for subnet- 



works of up to = 15 neurons [11] by direct summation 
of the partition function coupled with gradient descent 
[Eq. (25)]. These models did a surprisingly good job of 



predicting the collective firing patterns across the pop- 
ulation of all N neurons, as illustrated in Fig. [l] Im- 
portantly, the model of independent neurons makes er- 
rors of many orders of magnitude in predicting relative 
frequencies of the iV— neuron patterns, despite the fact 
that pairwise correlations are weak, and these errors are 
largely corrected by the maximum entropy model. The 
accuracy of the model can be further evaluated by asking 
how much of the correlative structure is captured. The 
overall strength of correlations in the network is mea- 
sured by the multi-information [41], defined as the total 
reduction in entropy relative to the independent model, 
/ = S[Pi\ - S[P,]. The ratio: 



h ^ S[Pi] - S[P2] 
I 5[Pi] - S[P,] 



(28) 



(26) 



thus gives the fraction of the correlations captured by 
the model. When N is small enough (< 10), S[Pi] can 
be evaluated by directly estimating Prior) from data. In 
the salamander retina J2// « 90%, indicating excellent 
performance of the model. 

The generality of the maximum entropy approach sug- 
gests that its validity should extend beyond the special 
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FIG. 1. The Ising model greatly improves the prediction of 
retinal activity over the independent model [W. A. Neuronal 
activity is summarized by a binary word ct = oi, . . . ,gm ob- 
tained by binning spikes into 20 ms windows. B. The fre- 
quencies of all spike words cr of a subnetwork of A'' = 10 
neurons are compared between the experiment (x axis) and 
the prediction (y axis) of the independent model (gray dots) 
and the maximum entropy model with pairwise interactions 
(black dots). The straight line represents identity. 



case of the salamander retina, and much subsequent work 
has been devoted to testing it in other contexts. In a ef- 
fort parallel to [19| , the activity of the retina of macaque 
monkeys |42j was analyzed with maximum entropy meth- 
ods. The behaviour of small populations [N = 3 to 7) 
of ON and OFF parasol cells was acurately explained by 
an Ising model, with 98 to 99% of the correlations cap- 
tured. Mammalian retinal ganglion cells can be classified 
into well-defined types, and cells of a given type tile the 
visual space like a mosaic [43]; this stands in contrast to 
the salamander retina, where cells are not well typed and 
are grouped in large patches responding to the same area 
of the visual space. It was found that restricting interac- 
tions to adjacent pairs in the mosaic did not significantly 
alter the performance of the model, at least under a lim- 
ited set of stimulus conditions, a result later confirmed 
for larger networks [351. 

The maximum entropy framework was also extended 



to other (non retinal) areas of the brain. In cultured cor- 
tical neurons [HI |2T] and cortical slices [21] , Ising models 
perfomed as well as in the retina (88 to 95% of the correla- 
tion captured) . Ising models also proved useful for study- 
ing neural activity in the visual cortex of cats [H] and 
macaque monkeys [45l|46]. In monkeys, the Ising model 
agreed well with data when neurons were far apart from 
each other (> 600 /im, tens of micro-columns), but failed 
at shorter separations (< 300 /^m, a few micro-columns), 
where higher order correlations prevail |46| . This em- 
phasizes the importance of testing the model predictions 
systematically on local as well as global observables, and 
if necessary add constraints to the model. 

Most of the work reviewed so far was restricted to 
small population sizes, partly because of the difficulty 
of recording from many neurons simultaneously, but also 
because of the computational problems mentioned in the 
previous section. In the salamander retina [19 , extrap- 
olations from small networks [N < 15) have suggested 
that the constraints imposed by pairwise correlations 
considerably limit the space of possible patterns (mea- 
sured by the entropy) as N grows, effectively confining 
it to a few highly correlated states when N « 200 — 
roughly the size of a patch of retinal ganglion cells with 
overlapping receptive fields. This led to the proposal that 
the network might be poised near a critical point. 

To test that idea, an Ising model of the whole pop- 
ulation of ganglion cells recorded in [TH] (iV = 40) was 
fitted using Monte Carlo methods and gradient descent 
[2U1 ITT] . Although the large size of the population forbids 
to compute global information theoretic quantities such 
a I2 /I, the validity of the model can still be tested on lo- 
cal observables not fitted by the model. Specifically, the 
model was found to be a good predictor of the three-point 
correlation functions {uiCijCTk) measured in the data, as 
well as of the distribution of the total number of spikes 
across the population. 



Armed with an explicit model (27) for the whole net- 
work, one can explore its thermodynamics along the 
lines sketched in section [ill The introduction of a fic- 
ticious temperature T [as in Eq. ( [14[ )] corresponds to a 
global rescaling of the fitting parameters, hi — ^ hi/ksT, 
Jij —7- Jij/T. As seen in Fig. [2] the heat capacity ver- 
sus temperature is found to be more and more sharply 
peaked around the operating temperature fcsT — 1 as 
one increases the network size N. One can also use these 
"thermodynamic" measurements to show that the ob- 
served networks of < 40 cells are very similar to net- 
works that are generated by mean spike probabilities and 
correlations at random from the observed distributions of 
these quantities. This raises the possibility that critical- 
ity could be diagnosed directly from the distribution of 
pairwise correlations, rather than their precise arrange- 
ment across cells. More concretely, it gives us a path 
to simulate what we expect to see from larger networks, 
assuming that the cells that have been recorded from in 
this experiment are typical of the larger population of 
cells in the neighborhood. The result for TV = 120 is an 
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FIG. 2. Divergence of the heat capacity is a classical signature 
of criticality. This plot represents the heat capacity versus 
temperature for Ising models of retinal activity for increasing 
population sizes [47]. The 'W = 20, rand," and iV = 
120 curves were obtained by infering Ising models for fictions 
networks whose correlations were randomly drawn from real 
data. Error bars show the standard deviation when choosing 
different subsets of A*' neurons among the 40 available. 




FIG. 3. The activity of populations of retinal ganglion cells 
obeys Zipf's law (from the data in Ref _.19J. Shown is the 
probability of activity patterns (or 'words') against their rank 
for various population sizes. Error bars show the variabil- 
ity across different choices of subpopulations. Note that the 
agreement with Zipf's law, P(cr) oc 1/rank, is maximum for 
larger A'^. 



even clearer demonstration that the system is operating 
near a critical point in its parameter space, as shown by 
the huge enhancement of the peak in specific heat, shown 
in the top curve of Fig. [2] 

This diverging heat capacity is further evidence that 
the system is near a critical point, but one might be wor- 
ried that this is an artifact of the model or of the fitting 
procedure. As we have seen in section [iTj the critical 
properties of the distribution P{(t) can be also explored 
directly, without recourse to the maximum entropy ap- 
proximation, by plotting the probability of firing pat- 
terns versus their rank. Figure [3j which shows such plots 
for increasing network sizes, reveals good agreement with 
Zipf's law, especially for larger N. 

Some of the inferred couplings were negative, indi- 
cating an effective mutual inhibition between two cells. 
We know from spin glass theory [?5] that negative cou- 
plings can lead to frustration and the emergence of 
many locally stable, or metastable, states. Formally, 
a metastable state is defined as a state whose energy 
is lower than any of its adjacent states, where adja- 
cency is defined by single spin fiips. Said differently, 
metastable states are local "peaks" in the probability 
landscape. In the retina responding to natural movies, 
up to four metastable states were reported in the popu- 
lation {N = 40). These states appeared at precise times 
of the repeated movie [20], suggesting that they might 
code for specific stimulus features. The synthetic net- 
work of = 120 cells displayed a much larger number 
of metastable states, and the distribution over the basins 



corresponding to these states also followed Zipf's law. At 
this point however, the exact relation between the pro- 
liferation of metastable states and criticality is still not 
well understood. 

In summary, these analyses give strong support to the 
idea that neural networks might be poised near a critical 
state. However, it is still not clear whether the observed 
signatures of criticality will hold for larger A^, especially 
when it is of the order of a correlated patch (~ 200) . The 
next generation of retinal experiments, which will record 
from fa 100 — 200 cells simultaneously, should be able to 
settle that question. 



V. ENSEMBLES OF SEQUENCES 

The structure and function of proteins is determined 
by their amino acid sequence, but we have made rela- 
tively little progress in understanding the nature of this 
mapping; indeed, to solve this problem completely would 
be equivalent to solving the protein folding problem |49l - 
I51j . An oblique way to tackle that question is to remark 
that a single function or structure often is realized by 
many different protein sequences. Can we use the statis- 
tics of these related proteins to understand how physical 
interactions constrain sequences through selection? 

To make progress, one first needs to define protein 
families. Since only a fraction of known proteins have 
a resolved structure or identified function, defining these 
families must rely on simplifying assumptions. The stan- 
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dard method for constructing a family is to start from 
a few well identified proteins or protein domains with a 
common structure or function j32|- A hidden Markov 
model is then inferred from that small pool of sequences, 
and used to scan huge protein databases to search for 
new members. Clearly, this method only works if the 
model can set a sharp boundary between members and 
non-members, and an implicit hypothesis underlying the 
whole approach is that families are indeed well separated 
from each other. 

Once a protein family has been defined, it is interest- 
ing to study its statistical properties. The data on a 
particular family consists of a multiple sequence align- 
ment, so that for each member of the family we have a 
string cr = (ci, . . . , crjv), where N is the number of amino 
acids in the protein and is one of the 20 possible amino 
acids at position i in the alignment, or alternatively an 
alignment gap cf. Fig. |4]A. It is useful to think of 
the family as a probabilistic object, described by a dis- 
tribution P(<t) from which sequences are drawn. As for 
networks of neurons, sampling P{cr) exhaustively is im- 
possible, so one must have recourse to approximations. 

Models of independent residues, Pi{cr) = YliLi Pi{<^i) , 
have been widely used in the literature. Physically, how- 
ever, residues do not simply contribute to the free en- 
ergy additively |53| . emphasizing the importance of cor- 
relations. Indeed, statistical analyses of protein families 
reveal strong correlations among the amino acid substi- 
tutions at different residue positions in short protein do- 
mains [SI]. To illustrate this, we represent in Fig. |4|3 
the mutual information between all pairs of positions in 
a multiple sequence alignment the "WW domain" family 
of proteins. WW domains are 30 amino acid long protein 
regions present in many unrelated proteins. They fold as 
stable, triple stranded beta sheets and bind proline rich 
peptide motifs. The mutual information gives a measure 
of correlations between non-numerical variables — here, 
the residue identity at given positions — defined by 



P»j(g»,crj-) 



(29) 



for a pair of positions i and j in the alignment, where 



Pij{a„aj)^ Y Pier), 

{o-k}k^i,j 



(30) 
(31) 



are the one and two point marginals of the distribution, 
respectively. Some pairs have as much as 1 bit of mutual 
information among them, which means that one residue 
can inform a binary decision about the other. 

How important are these correlations for specifying 
the fold and function of proteins? In a groundbreaking 
pair of papers |55l I56j , Ranganathan and his collabora- 
tors showed that random libraries of sequences consis- 
tent with pairwise correlations of WW domains repro- 
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FIG. 4. Network of correlations between residue positions in 
the protein family of WW domains. A. A protein sequence is 
a string cr of amino acids in the multiple sequence alignment 
of the family. Here is shown a small sample of co-aligned 
sequences. B. The mutual information between amino acid 
positions reveals a tighly connected network of correlations 
between residues all across the sequence. 



duced the functional properties of their native counter- 
part with high frequency. In contrast, sequences that 
were drawn from an independent distribution failed to 
fold. Technically, a random library consistent with pair- 
wise correlations was constructed using a simulated an- 
nealing procedure. The algorithm started from the native 
library and randomly permuted residues within columns 
of the multiple sequence alignment, thereby leaving the 
one point functions Pi{(Ji) unchanged. The Metropolis 
rejection rate was designed to constrain the two point 
functions pij{ai,aj): a cost was defined to measure the 
total difference between the correlation functions of the 
native and artificial libraries: 



C 



E 



log 



p. 



itificial 



{a, a') 



(32) 



and moves were accepted with probability e^'^'^/"^, where 
the algorithm temperature T was exponentially cooled to 
zero until convergence. 
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In spirit, this procedure seems similar to the maximum 
entropy principle: random changes make the library as 
random as possible, but with the constraint that the one 
and two point functions match those of the native library. 
That intuition was formalized in Ref |57j . where the two 
approaches were shown to be mathematically equivalent. 
However, to this day no explicit model for the maximum 
entropy distribution of the WW domains has been con- 
structed. 

The results from [55l |56] generated a lot of interest, 
and since then several studies have tried to explore the 
collective properties of proteins using similar ideas. We 
now review three of these recent efforts [53 EHl [iH] • All 
of these examples support the utility of maximum en- 
tropy methods in drawing meaningful conclusions about 
sequence families, while the last focuses our attention 
back on the question of criticality in these ensembles. 

"Two component signaling" is a ubiquitous system for 
the detection and transduction of environmental cues in 
bacteria. It consists of a pair of cognate proteins, a sensor 
histidine kinase (SK) which detects cellular and environ- 
mental signals, and a response regulator (RR) to which 
signal is communicated by SK via the transfer of a phos- 
phoryl group; the activated RR then triggers other bio- 
chemical processes in the cell, including in many cases the 
expression of other proteins. Many different versions of 
the two component system are present within and across 
species, with about 10 per genome on average. A natu- 
ral question about this system is how the specificity of 
coupling between particular SK and RR proteins is deter- 
mined, especially when the different family members have 
so much in common. To approach this problem, Weigt 
et al. studied a large collection of cognate SK/RR pairs, 
and built a maximum entropy model for the (joint) vari- 
ations in sequence |381 159] . The maximum entropy dis- 
tribution consistent with two point correlation functions 
PijicTi, aj) takes the form of a disordered Potts model: 



^0.06 
Q 

0.04 
0.02 
0, 



I 298.14 
T294,14 . 



I ^272,14 
k 272,18 



\.2m£3lfl.X-*- 

■ 268,15W *^68,18 



0.2 
Ml 




FIG. 5. The maximum entropy distinguishes between cor- 
relations arising from direct pairwise interactions, and cor- 
relations arising from collective effects [38]. A. The mutual 
Information ( |29[ ) between pairs of amino acid position is plot- 
ted versus the Direct information ( 34 1 , which measures the 



mutual information directly contributed by the pairwise in- 
teraction. Among highly correlated pairs, one distinguishes 
between strongly interacting pairs (red area) and pairs whose 
correlations result from collective effects (green area) B. Di- 
rect interactions dominate in the binding domain, while col- 
lectively induced correlations are mostly present in the phos- 
photransfer site. 



P{(t) = _e2:.'''('^-)+5:>, J..(<^..'-.)^ (33) 

z 

with the gauge constraints X^o-^iC'^) = and 
Scr' Jij{^^ f ') = So- Jiji^i '^') = 0- The distribution was 
approximately fitted to the data using mean field tech- 
niques [37ll59]. 

A key point, familiar from statistical mechanics, is that 
a relatively sparse set of interactions Jij can generate 
widespread correlations. It seems plausible that amino 
acids on the SK and RR proteins which govern the speci- 
ficity of their contact actually have to interact in the 
sense of the Potts model, while other residues may be- 
come correlated even if they don't have this essential role 
in specificity. The maximum entropy method allows for 
the distinction of the two cases. A 'Direct Information' 
(DI) was defined as the mutual information between two 
residues when all other residues are ignored: 

DI„- = MI[4--*] (34) 



where in 

pfr=*(a,a') = (35) 

the 'fields' h)" and /i ■ ' are chosen such that 

E.4"^*(o-,a') =p,(a') and Ta'Pfr'^^^^')^P^(^)- 
This direct information, which is zero only for Jij{-,-) — 
0, can be viewed as an effective measure of the interaction 
strength between two residues. Fig. [5] shows direct infor- 
mation versus mutual information for all pairs of residue 
positions in the protein complex. Direct pairwise interac- 
tions (large DI, large MI, red) were found to dominate in 
the binding domain. In contrast, collective effects arising 
from many weak interactions (low DI, large MI, green) 
characterized the phosphotransfer domain. Quite natu- 
rally, strong interactions (large DI, or equivalently large 
Jij's) were hypothesized to correspond to direct contact 
between residues that play a key role in the determina- 
tion of specificity. 
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To validate the connection between specificity and the 
Jij, the inferred interacting residue pairs were used to 
predict the structure of the transient complex formed 
by the two proteins upon binding. The prediction was 
shown to agree within crystal resolution accuracy with 
existing crystallographic data [60]. However efficient, this 
use of the method only focuses on the strongly interact- 
ing pairs involvled in binding, leaving out collective (and 
possibly critical) behaviors present in the phosphotrans- 
fer domain, where strong correlations arise from weak 
but distributed interactions. It would be interesting to 
explore the collective properties of the network as a whole 
through a more systematic study of the model's thermo- 
dynamic properties. 

In a parrallel effort, Halabi et al. ^SS] showed that vari- 
ability in protein families could be decomposed into a few 
collective modes of variations involving non-overlapping 
groups of residues, called 'sectors', which are function- 
ally and historically independent. To find these sectors, 
an estimator of the correlation strength was defined as: 

= D,D, -p,(ar")p,«"^)| , 

(36) 

where a^°^^ is the consensus sequence made of the most 
common residues at each position. The role of the 
weights 

^[l-K«°-)]g(a--)' ^^^> 

where q{'j) is the background probability of residues in 
all proteins, is to give more importance to highly con- 
served positions. The matrix Cij was diagonalized, and 
the projection of each position i onto the second, third 
and fourth largest eigenmodes (the first mode being dis- 
carded because attributed to historical effets) was rep- 
resented in a three dimensional space. In that space, 
which concentrates the main directions of evolutionary 
variation, residue positions can easily be clustered into a 
few groups, called sectors. 

This approach was applied to the SI A serine protease 
family, for which three sectors were found (Fig. [6]^). Re- 
markably, two of these sectors are related to two distinct 
biochemical properties of the protein, namely its thermal 
stability and catalytic power, and experiments showed 
that mutations in each sector affected the two properties 
independently (Fig. [6j3). The mutants used for these ex- 
periments were randomly generated by an Ising model 
identical to ( p7| ) for each sector. The model was fitted 
to the data after sequences cr were simplified to binary 
strings cr, with (t,; = (^((7^, cr™"^). Although no system- 
atic study of the many body properties of this model was 
carried out, the non-additive effect of mutations on the 
protein's properties was demonstrated by experiments on 
double mutants. 

Finally, in recent work, the maximum entropy ap- 
proach was used to study the diversity of an unambigu- 
ously defined family of proteins: the repertoire of B cell 
receptors in a single individual |33| . B cells are compo- 
nents of the immune system; each indivudual has many 
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FIG. 6. Independent sectors in the SIA serine protease family 
[58] . A. Residue positions i are plotted in the eigenspace of 
the weighted correlation matrix dj. Three clusters of posi- 
tions, called sectors, emerge (blue, red and green). B. Mu- 
tations in different sectors result in independent changes in 
the biochemical properties of the protein. Mutations in the 
red sector (red dots) affect the catalytic power (x axis) , while 
mutations in the blue sector (blue dots) change the thermal 
stability Tm (y axis). 



B cells, each of which expresses its own specific surface 
receptor (an antibody) whose task is to recognize anti- 
gens. Thus, the diversity of the repertoire of B cell re- 
ceptors carries an important biological function, as it sets 
the range of pathogens against which the organism can 
defend itself. The mechanisms by which diversity is gen- 
erated in the repertoire are complex and not entirely elu- 
cidated ini]- Recently, Weinstein et al. have sequenced 
almost exhaustively the repertoire of B cell receptors of 
single zebrafish [T^], allowing for the first time for a de- 
tailed analysis of repertoire diversity. 

A main source of the diversity is generated through a 
process called recombination, which pieces together dif- 
ferent segments of the antibody sequence (called V, D 
and J segments) , each of which is encoded in the genome 
in several versions. Additional diversity is generated at 
the VD and DJ junctions by random addition and re- 
moval of nucleotides during recombination. Finally, an- 
tibody sequences undergo random somatic hypermuta- 
tions, mostly in and around the D segment, throughout 
the lifetime of the cell. Thus, most of the diversity is 
concentrated around the D segments, which also consti- 
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FIG. 7. A translation invariant maximum entropy model of 
non-aligned sequences correctly predicts amino acid frequen- 
cies at absolute positions [35]. Top: the sequence is made 
of three segments, V D and J, of which on D and its flank- 
ing junctions are fitted by a translation invariant maximum 
entropy model. Left: for each position i from the left, the 
frequency table for all 20 residues is represented by a 
histogram. Right: comparison of these frequencies between 
data and model prediction (after rescaling by the translation- 
invariant independent model P^{ai)). 



FIG. 8. The repertoire of antibody D regions of zebrafish 
follows Zipf's law [33] ■ For a single fish, the probability of 
a small antibody segment involved in pathogen recognition 
is plotted versus its frequency rank, as in Fig. [3] The data 
(cyan) is compared with the prediction of a maximum entropy 
model consistent with nearest and next nearest neighbors cor- 
relations (red), and also with a model of independent residues 
(green). Inset: the same curve plotted for multiple individu- 
als. 



tute one of the three main loops involved in the pathogen 
recognition process. The D region (defined as the D seg- 
ment plus its flanking junctions) is therefore a excellent 
place to study repertoire diversity. 

Compared to the previous cases, the definition of the 
family here is straightforward: all D region sequences of a 
single individual. However, and in contrast to other pro- 
tein families, D sequences cannot be aligned, and their 
length varies considerably (from to 8 amino acids). To 
circumvent this problem, a maximum entropy distribu- 
tion consistent with translation invariant observables was 



defined. This leads to writing a model similar to Eq ( 33 1 , 
but where hi = h and Jij = Jk={i-j) do not depend on 
the absolute position of the residues along the sequence. 
In addition, in order to account for the variable length, 
the length distribution itself was added to the list of fit- 
ted observables, resulting in a chemical potential /i[L(cr)] 
being added to the Potts energy, where L{cr) is the se- 
quence length. 

The model was fitted by gradient descent combined 
with Monte Carlo simulations. Pairwise correlation be- 
tween nearest and second nearest neighbors alone ex- 
plained 70 to 90% of correlations, contributing to a large 
drop in entropy compared to the independent model, 
from 15 to 9 bits on average. Thus, correlations lim- 
ited the size of the repertoire by a ^ 2® = 64 fold factor. 
Despite it being translation invariant, the model could 
also reproduce local observables by simple end effects. 



such as the > lOx variation in amino acid frequencies at 
given absolute positions, as shown in Fig.jTj 

One striking prediction of the model is that the reper- 
toire follows Zipf's law, in close analogy to results ob- 
tained for the activity of neural networks. Since the ex- 
haustive sampling of P{(t) is possible in this case, that 
prediction can be directly tested against the data, and 
was found to be in excellent agreement (Fig. [8|. Im- 
portantly, pairwise correlations between residues are es- 
sential for explaining this behavior, as evidenced by the 
failure of the independent model to reproduce it. The 
law also seems to be universal, as its varies little from in- 
dividual to individual, despite substantial differences in 
the details of their repertoires. 

In addition, the model was used to look for metastable 
states, performing a similar analysis as was done for 
the retina in the previous section. About ten relevant 
metastable states were found for each individual. Not 
all these states could be mapped onto a genomic tem- 
plate, and it was hypothesized that these non-templatcd 
states might reflect the history of antigenic stimulation 
and thus "code" for an efficient defense against future 
infections. Furthermore, continuous mutation paths ex- 
isted between almost all metastable states, showing that 
the repertoire efficiently covers gaps between metastable 
states, and emphasizing the surprising plasticity of the 
repertoire. 

These results suggest that correlations in protein fam- 
ilies build up to create strongly correlated, near-critical 



12 



states. A practical consequence for protein diversity is 
that collective effects limit the space of functional pro- 
teins much more dramatically than previously thought. 
This should invite us to revisit previously studied fami- 
lies (WW domains, SK/RR pairs, serine proteases, but 
also PDZ, SH2, and SH3 domains) to investigate their 
thermodynamical properties, with the help of maximum 
entropy models, in search of critical signatures. 



VI. FLOCKS OF BIRDS 



full velocities velocity fluctuations 




Groups of animals such as schooling fish, swarming 
insects or flocking birds move with fascinating coordina- 
tion 62J. Rather than being dictated by a leader or in 
response to a common stimulus, the collective patterns of 
flock dynamics tend to be self organized, and arise from 
local interactions between individuals, which propagate 
information through the whole group. Flocks, schools 
and swarms also are highly responsive and cohesive in 
the face of predatory threat. This balance between order 
and high susceptibility points to the idea of criticality. 
Recent field work and theoretical analysis pioneered by 
the STARFLAG team HMTT] (see also W\ for a review in 
relation to previous models) , has framed this idea in pre- 
cise mathematical terms, culminating in the first empir- 
ical evidence that flock behaviour may indeed be critical 
in the sense of statistical physics [S3] . Before embarking 
on the description of these results, we first review the 
technical advances that have made these developments 
possible. 

Three dimensional studies of flocks were pioneered by 
Cullen et al. [65] . Until recently, such experiments have 
focused on small populations of a few tens of individuals, 
which is insufficient to investigate the large scale prop- 
erties of flocks. The accurate reconstruction of the three 
dimensional positions of large flocks is impeded by many 
technical challenges and has been a major bottleneck. In 
principle, one can infer the three dimensional coordinates 
of any object from two photographs taken simultaneously 
from different viewpoints. But in the presence of a large 
number of indistinguishable birds, individuals first need 
to be identified between photographs before that simple 
geometric argument can be used; this is the so-called 
matching problem. Use of three cameras can help, but 
in the presence of noise the matching problem is still 
highly challenging. In Ref [15] . new techniques were de- 
veloped to aid the resolution of the matching problem. 
The main idea is to compare the patterns formed by the 
immediate neighborhood of each individual between dif- 
ferent photographs. The best match is then chosen as 
the one maximizing the overlap between these patterns 
in the different photographs. 

With the help of this technique, triplets of carefully cal- 
ibrated, high resolution photographs of flocks of starlings 
taken from three different viewpoints were processed and 
analysed to yield accurate positions and velocities for all 
the individuals of flocks comprising up to 2700 birds; see 



FIG. 9. Two dimensional projection of a typical 3D recon- 
struction of the positions and velocities of every bird in a 
flock of 1,246 starlings [64]. Left: the absolute velocities Vi 
show a high degree of order in bird orientation. Right: the 
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Fig. [9] for an example. Preliminary analysis focused on 
the overall size, shape, density, homogeneity and flying 
direction of entire flocks [T31 [H] . A subsequent study [T7] 
demonstrated that birds interact with their neighbors ac- 
cording to their topological distance (measured in units 
of average bird separation), rather than to their metric 
distance (measured in units of length). The reasoning 
leading to that conclusion is quite indirect and is worth 
explaining in some detail. The distribution of neighbors 
around an average bird is not uniform: birds tend to have 
closer neighbors on their sides than behind or in front of 
them. There are biological reasons for this. Birds have 
lateral vision, and can monitor their lateral neighbors 
with better accuracy. In addition, keeping a larger dis- 
tance with frontal neighbors may be a good strategy for 
avoiding collisions. The main assumption of |17| is that 
this heterogeneity is a result of interactions between in- 
dividuals, and can be used to estimate the range of these 
interactions, defined as the distance at which the neigh- 
borhood of an average bird becomes uniform. Plotting 
this range for various flock densities both in topologi- 
cal and metric units (Fig. 10) clearly showed that birds 
interact with a fixed number (~ 7) of neighbors rather 
than with birds within a fixed radius as was previously 
thought. 

How does global order emerge across the whole flock 
from local interactions? Clearly, if each bird perfectly 
mimics its neighbors, then a preferred orientation will 
propagate without errors through the flock, which will 
align along that direction. In reality, alignment with 
neighbors is not perfect, and noise could impede the 
emergence of global order. This situation is similar to 
that encountered in physics, where increasing the tem- 
perature destroys the ordered state (melting). Consider 
for example a uniform, fully connected Ising model — the 
simplest model of ferromagnetism — defined by Eq. (27) 
with J,;,, = J/N and hi = h. At equilibrium, its 



mean magnetization m = ^'^i{<^i) = satisfies m 
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FIG. 10. Topological versus metric: flocking birds interact 
with a finite and fixed number of neighbors |17| . The in- 
teraction range is plotted in terms of number of interacting 
neighbors ric (left) and in terms of the metric distance rc 
(right), as a function of the sparseness ri, defined as the av- 
erage separation between neighbors. The topological range 
ric ~ 7 is invariant while the metric range rc scales with the 
linear sparseness. 



tanh( Jto + h) [B^. Under a small field /i = 0+, the sys- 
tem is completely disordered (m = 0) when the control 
paramater J (inverse temperature) is smaller than 1, but 
becomes ordered (to > 0) for J > 1. Interestingly, a 
similar phase transition occurs in simple models of flock 
dynamics [67] . where the external control parameter can 
be the noise, the flock density, or the size of the align- 
ment zone. This phase transition, and the concomittent 
spontaneous symmetry breaking, were analyzed analyti- 
cally in a continuum dynamical model which exactly re- 
duced to the XY model in the limit of vanishing velocities 

MM- 

Order is not exclusive to self organized systems, and 
can instead result from an external forcing (in language 
appropriate to flocks, by a leader or a shared environmen- 
tal stimulus). In the Ising model, this corresponds for 
example to J = and h ^ 1. To better discriminate be- 
tween self-organization and global forcing, one can exam- 
ine the response function of the system, or equivalently 
(by virtue of the fluctuation-dissipation theorem) the cor- 
relation functions of small local fluctuations around the 
ordered state. In the context of flocks, a large response 
function means that the flock is not only ordered, but 
also responds collectively to external perturbations. It is 
tempting to suggest that this property is desirable from 
an evolutionary point of view, as it implies a stronger 
responsiveness of the group to predatory attacks. We 
will see that this is indeed how flocks of birds behave. 
Note that in physical systems, high susceptibility is only 
achieved near a critical point. In the disordered phase, 
variables are essentially independent from each other, 
while in the ordered phase, variables are aligned but their 
fluctuations become independent as the temperature is 
lowered. What is the situation for bird flocks? 

To explore these ideas empirically, Cavagna et al. [64] 
analyzed the velocity correlations of large flocks, using 
the same dataset as in previous studies. At this point it 
should be stressed that here, at variance with the pre- 
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FIG. 11. Velocity fluctuations are scale free [64]. A.. The 
correlation length ^ scales linearly with the system's size L, 
indicating that no other scale than L is present in the system. 
B. Correlation function C versus rescaled distance r/^. ^ is 
defined as the radius for which C — 0. The slope at r = ^ 
(Inset) seems to depend only weakly upon ^. This suggests 
that coherence can in principle be preserved over extremely 
long ranges. 



vious cases of neurons and proteins, learning the prob- 
ability distribution of the system's state is impractical 
because only one example of the flock's state is avail- 
able to us. On the other hand, translation invariance (if 
one excludes the edges of the flock) and homogenpeity 
in the birds' behavior can be invoked to make statisti- 
cal statements across the population. Let us call Vi the 
3D velocity vector of a bird i — 1, . . . ,N. The amount 
of order in the flock is typically measured by the po- 
larization II ;^ X^i uf^ II I whose value is here very close 
to 1 (0.96 ± 0.03) in agreement with previous studies. 
But as discussed earlier, more interesting are the fluc- 
tuations around the global orientation, defined by the 
velocities in the reference frame of the center of mass: 
Ui = Vi — (1/A^) ^i- Correlations in these fluctua- 
tions are captured by the distance dependent correlation 
function: 



C{r) 



1 Y.^.1^^■Uj5{r^r,j) 



Co 



(38) 



where is the distance between birds i and j, 5{-) is a 
(smoothed) Dirac delta function, and cq is chosen such 
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that C(r = 0) = 1. The correlation function C(r) is 
plotted in Fig. [TT| \ for different flock sizes as a function 
of the rescaled distance r/^, where ^ is a characteristic 
length defined by C(^) = 0. All points seem to fall onto 
a single curve. 

The results of Fig. [TT] A. are consistent with what we 
know from scaling theory in physics [66j. Near a criti- 
cal point, correlation functions are given by a universal 
function, 



7-7 



(39) 



where ^ is the correlation length which diverges as the 
critical point is approached. Strikingly, in bird flocks, the 
correlation length ^ is found to scale with the linear size of 
the flock L (Fig.[TT)3). This indicates that the correlation 
function is in fact scale free, in the sense that no scale 
is present except for the system size. Replacing ^ = aL 
into (391 and taking L — > oo yields a power law decay 
for the correlation function, C(r) — r^^ , characteristic 
of a critical point. The exponent 7 can in principle be 
evaluated from data through the derivative of C at r = ^: 
^dC/dr cx However, as evident from the inset of 

Fig. im ^V, 7 is almost indistinguishable from zero. This 
implies that the correlation function is not only scale free, 
but also decays very slowly, implying extremely strong 
and long ranged coherence across the flock. 

The same analysis was carried out on the correlations 
of the modulus of the velocity, rather than its orienta- 
tion, yielding essentially the same restults. A physical 
system with a spontaneously broken symmetry, such as 
its overall orientation, can display scale free ( "massless" ) 
behavior of the quantity associated to that symmetry, 
even when no critical point is present (Goldstone modes). 
However, the modulus of velocity is a much stiffer mode 
than velocity orientation, and corresponds to no obvious 
symmetry. The fact that it also exhibits scale free be- 
havior thus is stronger evidence that the system indeed 
is close to a critical point. 

One must be cautious when extrapolating from fi- 
nite system sizes, and conclusions drawn from these ex- 
trapolations must be examined with increased scrutiny. 
Nonetheless, the wealth of evidence in favor of criticality 
makes it a very useful and pertinent concept for under- 
standing complex flock dynamics. We expect that con- 
tinued improvements in experimental technique and data 
analysis methods will test the hypothesis of criticality 
much more sharply. 



VII. DYNAMICAL VS. STATISTICAL 
CRITICALITY 

So far, we have assumed that states of a biological sys- 
tem were drawn from a stationary probability distribu- 
tion P((t), and we have explored questions of criticality 
in the associated statistical mechanics model. Criticality, 
however, can also be meant as a dynamical concept. For 



example, in models of self-organized criticality mentioned 
in the introduction, avalanches are by nature a dynam- 
ical phenomenon [T]. We now discuss two lines of work 
in this direction: the observation of critical avalanches 
of activity in networks of cultured neurons, and dynami- 
cal criticality close to a Hopf bifurcation in the auditory 
system. 

We start with avalanches in neural networks [7nU72| . 
Consider a control parameter for neuronal excitability, 
which sets how much a spike in one neuron excites its 
neighbors. If this parameter is too low, a spike in one neu- 
ron may propagate to its direct neighbors, but the associ- 
ated wave of activity will quickly go extinct. Conversely, 
if the excitability parameter is too high, the wave will ex- 
plode through the whole population and cause something 
reminiscent of an epileptic seizure. To function efficiently, 
a neural population must therefore poise itself near the 
critical point between these two regimes. The analogy 
with sandpiles and earthquakes is straightforward: when 
a grain falls, it dissipates some its mechanical energy to 
its neighbors, which may fall in response, provoking an 
avalanche of events [I]. A similar argument applies to 
earthquakes and the propagation of slips [ 73] . 

The most striking feature of self-organized criticality 
is the distribution of the avalanche sizes, which typically 
follows a power law. Beggs and Plenz [73] were the first 
to report such power laws in the context of neural net- 
works. In their experiment, a 60-channel multielectrode 
array was used to measure local field potentials (a coarse 
grained measure of neural activity) in cortical cultures 
and acute slices. Activity occured in avalanches — bursts 
of activity lasted for tens of milliseconds and were sep- 
arated by seconds long silent episodes — that propagated 
across the array (Fig. [I2|\_). For each event, the total 
number of electrodes involved was counted as a measure 
of avalanche size. The distribution of this size s followed 
a power-law with an exponent close to —3/2 (Fig 12 3). 



Although that exponent was first speculated to be uni- 
versal, it was later shown that it depends on the details 
of the measurement method [75] . 

The critical properties of neural avalanches can be ex- 
plained by a simple branching process [73] . Assume that 
when a neuron fires at time each of its neighbors has a 
certain probability of firing at time t -I- 1, such that the 
average number of neighbors firing at i + 1 is given by 
the branching parameter /3. That parameter is exactly 
what we called "excitability" earlier; /3 < 1 leads to to an 
exponential decay of the avalanche, /? > 1 to its exponen- 
tial and unlimited growth, and /3 = 1 defines the critical 
point. To support that simple theory, the parameter /3 
was estimated directly from the data, and was found to 
be 1 within error bars. 

Can we connect this notion of criticality in a neural 
network to the ideas discussed in Section|TT]? Consider, for 
example, the simple mean field branching process on an 
infinite tree analyzed in |77j and summarized by Fig. 13 
When p = 1/2 (/3 = 1), one can show, by recursively 
calculating the generating function of the avalanche size 



15 



□□□□□□□ 



time (4 ms frames) - 



ft. 



data 

poisson assumption 




10° 



avalanche size, S 



102 



FIG. 12. The distribution of avalanche sizes follows a power 
law. A. Sample avalanche propagating on the 8x8 multi- 
electrode array. B. Probability distribution of avalanche sizes 
(measured in number of electrode) in log-log space. The dis- 
tribution follows a power-law with a cutoff set by the size of 
the array. 




FIG. 13. A simple branching process on a tree [77] • Start- 
ing from the root, activity propagates to its two descendants 
with probability p — 1/2, or to none with probability 1 — p. 
The process repeats itself for all active descendants. In this 
example black node are active, while white node are inactive. 
The size of the avalanche is s = 7. 



s, that the distribution of avalanche sizes becomes 
P{s > 1) . 



(40) 



Although the resemblance of the exponent 3/2 to that 
found in |74| is coincidental, this simple process nonethe- 
less predicts a power law in the avalanche size. Similar 
models defined on lattices or on completely connected 
graphs were proposed to explore the functional proper- 
ties of neural avalanches I74,;78j29j. When p = 1/2, the 
probability of any particular avalanche event <t is easy to 
estimate, and is 2"'', where s is the size of the avalanche; 
note that there are many states cr that correspond to the 
same size s. Using our definition of the "energy" from 
Eq. we have E{(t) = slog(2). By virtue of Eq. ( [40| , 



however, in this dynamically critical state the probabil- 
ity that a random configuration has energy E decays less 
rapidly than an exponential, and this must result from a 
near perfect balance between energy and entropy: 



1 



P{E) -e 



Z (log 2)3/2 ' 



which implies: 



S{E) ^E~- log(i?) 



(41) 



(42) 



and this is (for large E) Zipf 's law once again. Note that 
this result is driven solely by the fact that the distri- 
bution of avalanche sizes has a long tail, and not by any 
specific power law behaviour. To summarize, in the space 
of avalanche configurations we have the same signature 
of criticality that we have seen in the retina (Figs. [2] and 
[s]), although in different tissues, with different measure- 
ment methods, and assuming different models of activity. 
This emphasizes the potential generality of Zipf 's law and 
criticality for brain function. 

The space of possible avalanches is huge, and one might 
wonder whether avalanches can serve as a basis for a neu- 
ral code. In a simple branching process, each avalanche of 
a given length occurs completly at random and is as prob- 
able as any other. But in a real network, with disordered 
excitabilities and branching parameters, some types of 
avalanches may be more likely than others, forming at- 
tractors in avalanche space. Such attractors were de- 
tected in the experimental data by clustering all observed 
avalanche patterns [50] . Remarkably, simulations of dis- 
ordered branching processes show that a large number 
of attractors is only possible when the system is close to 
the critical point (average branching parameter 1) |79j . 
These results are reminiscent of those found in the retina, 
with the difference that attractors are now defined in a 
dynamical space rather than as metastable states in the 
space of configurations. As in the retina, the exact func- 
tion of these attractors for coding is still elusive. 

We now turn to another example where dynamical crit- 
icality plays an important role, although in a different 
way, in the context of the auditory system. Our ear 
is remarkably sensitivive to weak sounds, responding to 
motions of the same magnitude as thermal noise. As 
early as 1948, Gold jSlJ proposed that this sensitivity 
is achieved by compensating damping through an active 
process. Several observations support this hypothesis. 
Hair cells, which convert mechanical movement into elec- 
trical current, respond to sounds in a highly non-linear 
manner, by strongly amplifying low amplitude stimuli at 
some frequency that is characteristic of each cell. In ad- 
dition, hair cells display small spontaneous oscillations 
even in the absence of stimulus. Most dramatically, the 
ear can actually emit sounds, spontaneously, presumably 
as the result of damping being (pathologically) over- 
compensated at some points in the inner ear [821 183) . 
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A series of recent works, both theoretical and experi- 
mental, have shown that the mechanosensing system of 
hair cells is tuned close to a Hopf bifurcation, where the 
system is highly sensitive to stimulation (see [ST for a 
recent review) . Before going into the specifics of hair cell 
biophysics, let us first explain the basic idea. 

A Hopf oscillator is described by two essential dynam- 
ical variables, often collected into a single complex num- 
ber z. Hopf 's oscillators form a universality class of dy- 
namical systems, and in response to small forces near the 
bifurcation point, the dynamical equations can always be 
written as 



Response relations 



dz 



(43) 



In the absence of forcing, F = 0, self-sustained oscilla- 
tions appear for /i > 0: z = re'"''*, with r — ^Ji. When 
a stimulus is applied at the resonant frequency wqi the 
system simplifies to 



z = re'""*. 



dr 
di 



F. 



(44) 



Precisely at the bifurcation point /i = 0, there is no 
regime of linear response; instead we have r = F^/^. The 
key point is that the "gain" of the system, r/F = F^^^^, 
diverges at small amplitudes, providing high sensitivity 
to weak forcings (Fig. 14). This very high gain does not 



extend much beyond the resonant frequency ujq'- it drops 
already by half from its peak when jw— wqI — 3-\/7F^/"^/4. 

How does this theory fit into what we know about the 
auditory system? As we have seen, an active process 
is necessary for amplification. In hair cells, this active 
process is provided by hair bundle motility powered by 
molecular motors, which causes spontaneous oscillations 
at a characteristic frequency that depends on the geome- 
try of the hair bundle. Hence each cell will be highly 
selective for one particular frequency. Signal is tran- 
duced by the opening of channels upon deflection of the 
hair bundle, which has the effect of depolarizing the cell. 
The interplay of hair bundle motility and external forc- 
ing provides the basic ingredients for an excitable Hopf 
oscillator. The relevance of Hopf's bifurcation in hair 
cells was suggested in ^ , and its consequences in terms 
of signal processing was explored in |87| . In a parallel 
efi^ort [8 8) , an explanation was proposed for how the sys- 
tem tunes itself near the critical point in the oscillating 
regime {fi = 0"*"). The idea is that feedback is provided by 
the activity of the channels themselves, notably through 
the calcium ion concentration C which controls the ac- 
tivity of the motors responsible for bundle motility. At 
first approximation one can write C = C{fi). Channel 
activity regulates C through 



dC/dt^~C/T + J{x), 



(45) 



where r is the relaxation time, x — Re(z) the hair bun- 
dle displacement, and J{x) the displacement-dependent 
ion flux. For an oscillatory input x = rcos{ujt), J(r) — 
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FIG. 14. Response to an oscillatory input of a Hopf oscillator 
near its critical point 84 . A. Response (displacement) as a 
function of input frequency, for increasing input amplitudes 
from dB (lower curve) to 80 dB (top curve). This plot 
emphasizes the amplification of small inputs, as well as the 
shrinking width of the frequency range where amplification is 
present. B. Displacement as a function of simulus amplitude, 
plotted in log space. The red curve, of slope 1/3, shows the 
enhanced response at the critical (resonant) frequency. For 
other frequencies (whose color correspond to the frequencies 
marked by lines in A), the response is linear. 



( J(x)) is an increasing function of r (assuming that J{x) 
is convex). Thus, the non-oscillatory part of C will tune 
itself at a value such that C(/i) = Tj(r) = tJ(y^). 
One can show that, for relevant physical parameters, this 
drives the system to small values of /i, that is, close to 
the bifurcation point. 

Experiments were able to confirm this picture quanti- 
tatively by measuring the voltage response of frog hair 
cells to an input current. The results showed an en- 
hanced gain for small amplitudes at the resonance fre- 
quency (Fig. 15), as predicted by the theory. There are 
also classical experiments in auditory perception that are 
explained by the Hopf scenario. In particular, in the 
presence of two tones at frequencies fi and /2 , we hear a 
combination tone at frequency 2/i — /2 , but the apparent 
intensity of this sound scales linearly with the intensity 
of the primary tones. This is completely inconsistent 
with a system that has a linear response and perturba- 
tive nonlinear corrections, but agrees with the 1/3 power 
response at the critical point. 



Again, we can ask how Hopf bifurcation relates to the 
equilibrium notion of criticality we have explored before. 
If we examine the equation governing the evolution of 
the amplitude r as a function of time, Eq (44), we can 



formally rewrite it as the overdamped motion of a coor- 
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FIG. 15. Experimental evidence of a Hopf bifurcation in hair 
cells [SS]. Shown is the membrane potential as a function 
of the input current for different input frequencies. Two top 
curves: an input current oscillating at the resonant frequency 
(126 Hz) is amplified in a non linear way. Bottom curve: the 
relation becomes linear when the input frequency is 20 Hz 
above the resonant frequency. 



dinate r in a potential U: 

dr dU , r"^ ^ 

dt-~l^^ Uir)-9^-,^-Fr, (46) 

with g — 1. The form of U is familiar: it describes 
Landau's theory of second order phase transitions. In 
Landau theory, /i is a function of the model parameters 
(notably the temperature), and vanishes at the critical 
point. One might object that this dynamical model is 
not really a many body system, and so can't have a true 
phase transition. But in all ears, and especially in the 
mammalian cochlea, there are many hair cells, tuned to 
different frequencies, and they are mechanically coupled 
to one another. Maximal amplification at each frequency 
thus requires that the whole system be set such that a 
macroscopic fraction of the dynamical degrees of freedom 
are at criticality fSnHHS- Presumably this interacting sys- 
tem should exhibit departures from mean field or Landau 
behavior, although this has not been explored. 



VIII. LOOKING AHEAD 

We write at a fortunate moment in the development 
of our subject, when experiments are emerging that hold 
the promise of connecting decades of theoretical discus- 
sion to the real phenomena of life, on many scales. We 
hope to have conveyed our reasons for thinking that this 
is a remarkable development, but also to have conveyed 
the challenges inherent in this attempt to bring theory 
and experiment into more meaningful dialogue. 

The first challenge is that we do have somewhat differ- 
ent notions of criticality in different systems, even at the 
level of theory, and these differences are amplified as we 
examine the many different approaches to data analysis. 



This is a deep problem, not necessarily limited to bio- 
logical systems. Except in a few cases, the mathematical 
language that we use to describe criticality in statistical 
systems is quite different from the language that we use 
in dynamical systems. Efforts to understand, for exam- 
ple, current data on networks of neurons will force us to 
address the relations between statistical and dynamical 
criticality more clearly. 

The second major challenge is that using the maxi- 
mum entropy method to analyze real data requires us 
to solve an inverse statistical mechanics problem. This 
problem is tractable far away from critical points, but 
near criticality it seems very difficult. If we had more 
analytic understanding of the problem, it might be pos- 
sible to identify the signatures of criticality more directly 
from the measurable correlations, perhaps even allowing 
us to draw conclusions without explicit construction of 
the underlying model. Absent this understanding, there 
is a serious need for better algorithms. 

A third set of challenges comes from the nature of the 
data itself. While we have celebrated the really revo- 
lutionary changes in the scale and quality of data now 
available, there are limitations. In some cases, such as the 
flocks of birds, we have relatively few independent sam- 
ples of the network state; even if we had access to longer 
time series, the topology of the network is changing as 
individual birds move through the flock, and we would 
be forced back to analyzing the system almost snapshot 
by snapshot. In other cases, such as protein sequences, 
we have access to very large data sets but there are un- 
known biases (the organisms that have been chosen for 
sequencing) . 

A more subtle problem is that, in all cases, the cor- 
relations that we observe have multiple origins, some of 
which are intrinsic to the function of the system and some 
of which reflect external influences. For many of the sys- 
tems we have considered, most of the literature about the 
analysis of correlations has sought to disentangle these 
effects, but this work makes clear that it might not be 
possible to do this without introducing rather detailed 
model assumptions (e.g., about the mechanisms generat- 
ing diversity in the antibody repetoire vs. the dynamics 
of selection in response to antigenic challenge). In the 
case of the retina, we know that, quantitatively, roughly 
half the entropy reduction in the network relative to in- 
dependent neurons is intrinsic, and half arises in response 
to the visual stimulus [19^ , but even the "extrinsic" corre- 
lations are not passively inherited from the outside world, 
since the strength and form of these correlations depends 
on the adaptation state of the underlying neural circuitry. 
If the networks that we observe, reflecting both intrinsic 
and extrinsic effects, operate near a critical point, this 
fact may be more fundamental than the microscopic ori- 
gins of the correlations. 

Hopefully the discussion thus far has struck the correct 
balance, exposing the many pieces of evidence pointing 
toward critical behavior in different systems, but at the 
same time emphasizing that criticality of biological net- 
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works remains a hypothesis whose most compelling tests 
are yet to come. To conclude our review, let's take the ev- 
idence for criticality at face value, and discuss two ques- 
tions which are raised by these observations. 

The first question is why biological systems should be 
nearly critical. What benefits does operation at this spe- 
cial point in parameters pace provide for these systems? 
For birds, we have seen that criticality confers high sus- 
ceptibility to external perturbations, and this enhanced 
reactivity endows them with a better defense mechanim 
against predators. Similarly, in the auditory system, be- 
ing close to a bifurcation point allows for arbitrarily high 
gains and accurate frequency selectivity in response weak 
sounds. 

In neural populations, the naive idea underlying the 
theory of branching processes makes criticality seem al- 
most inevitable — a middle point between death and 
epilepsy. However, the function of neural networks is not 
only to be reactive, but also to carry and process complex 
information in a collective manner through its patterns 
of activity. The observation and analysis of metastable 
states, both in retinal acitivity analyzed within the max- 
imum entropy framework [20] and in the activity of corti- 
cal slices analyzed with the theory of branching processes 
[79] , suggest that criticality may be coupled to the explo- 
sion of these states, allowing for a wider set of coding op- 
tions. A more detailed analysis is needed to support this 
speculation, and to better understand how metastable 
states can be learned and used in practice for efficient 
decoding. More generally, criticality runs counter to sim- 
ple notions of efficiency in neural coding, suggesting that 
other principles may be operating, as discussed in Refs 
[201147]. In the case of immune proteins, criticality could 
be useful for preparedness to attacks, and could result 
from a tight balance between the expected — prior expe- 
rience with antigens, as well as hereditary information en- 
coded in the genomic templates — and the unknown. As 
in the case of neural coding, the existence of metastable 
states and their potential for encoding pathogen history 
may be enhanced by criticality. 

The second question is how criticality can be achieved, 
apparently in so many very different systems. Critical 
systems occupy only a thin region (sometimes even a sin- 
gle point) of the parameter space, and it is not clear how 
biological systems find this region. In some cases, a feed- 
back mechanism can be invoked to explain this adapta- 
tion, as in the case of hair cells, where the active process 
is itself regulated by the amplitude of the oscillations it 
produces. In networks of neurons, synaptic plasticity is 
a good candidate, and there are models that use (more 
or less) known mechanisms of synaptic dynamics to sta- 
bilize a near-critical state [91j. In other cases however. 



no obvious explanation comes to mind. 

Bird flocks display coherence over very large length 
scales, which suggests that the strength of the underly- 
ing interactions (that is, the precision with which each 
bird matches its velocity vector to its neighbors) is tuned 
very precisely, but we have no idea how this tuning could 
be achieved. In the case of the immune system, feed- 
back does not seem a plausible explanation, because the 
immune repertoire is constantly renewing itself and out 
of equilibrium. It is worth noting that a simple mecha- 
nism of exponential growth with introduction of random 
novelty, called the Yule process [S2], predicts Zipf's law. 
However such a model suffers from the same flaws as 
the branching processes mentioned above: the resulting 
states are uniformly random, and cannot carry informa- 
tion about their environment, as one would want from an 
adaptive immune system. Besides, the Yule process does 
not account for the existence of a constant source of ge- 
nomic antibody segments. Therefore, the mechanism by 
which the repertoire maintains criticality remains largely 
elusive and requires further investigation. 

To summarize, we have discussed experimental evi- 
dence of criticality in a wide variety of systems, spanning 
all possible biological scales, from individual proteins to 
whole populations of animals with high cognitive capac- 
ity, in stationary as well as dynamical systems. The wide 
applicability of the concepts exposed here, fueled by an 
increasing amount of high quality data, makes for an 
exciting time. Ideas which once seemed tremendously 
speculative are now emerging, independently, from the 
analysis of real data on many different systems, and the 
common features seen across so many levels of biological 
organization encourage us to think that there really are 
general principles governing the function of these com- 
plex systems. 

ACKNOWLEDGMENTS 

We thank our many collaborators for the pleasure of 
working together on these ideas: D Amodei, MJ Berry 
II, CG Callan, O Marre, M Mezard, SE Palmer, R Ran- 
ganathan, E Schneidman, R Segev, GJ Stephens, S Still, 
G Tkacik, and AM Walczak. In addition, we are grateful 
to our colleagues who have taken time to explain their 
own ideas: A Cavagna, I Giardina, MO Magnasco, and 
M Weigt. Speculations, confusions, and errors, of course, 
remain our fault and not theirs. This work was supported 
in part by NSF Grants PHY-0650617 and PHY-0957573, 
by NIH Grant P50 GM071598, and by the Swartz Foun- 
dation; T. M. was supported in part by the Human Fron- 
tiers Science Program. 



[1] P Bak, C Tang, and K Wiesenfeld, "Self-organized crit- 59, 381-384 (Jul 1987). 

icality: An explanation of the 1/f noise," Phys Rev Lett [2] Per Bak, How Nature Works (Springer, New York, 1996). 



19 



[3] P Bak and K Sneppen, "Punctuated equilibrium and crit- 
icality in a simple model of evolution," Phys Rev Lett 71, 
4083-4086 (Dec 1993). 

[4] Stephen Gould and Niles Eldredge, "Punctuated equi- 
libria: The tempo and mode of evolution reconsidered," 
Paleobiology 3, 115-151 (Apr 1977). 

[5] M Usher, M Stemmler, and Z Olami, "Dynamic pattern 
formation leads to 1/f noise in neural populations," Phys 
Rev Lett 74, 326-329 (Jan 1995). 

[6] J J Hopfield, "Neural networks and physical systems with 
emergent collective computational abilities," Proc Natl 
Acad Sci USA 79, 2554-8 (Apr 1982). 

[7] J J Hopfield and D W Tank, "Computing with neural 
circuits: a model," Science 233, 625-33 (Aug 1986). 

[8] Daniel J. Amit, Modeling Brain Function: The World of 
Attractor Neural Networks (Cambridge University Press, 
Cambridge, 1989). 

[9] John Hertz, Anders Krogh, and Richard G. Palmer, In- 
troduction to the theory of neural computation (Addison- 
Wesley, 1991). 

[10] Ronen Segev, Joe Goodhouse, Jason Puchalla, and 
Michael J Berry, "Recording spikes from a large fraction 
of the ganglion cells in a retinal patch," [Nat Neurosci| 7, 
1154-61 (Oct 2004). 
[11] A M Litke, N Bezayiff, E J Chichilnisky, W Cunning- 
ham, W Dabrowski, A A Grillo, M Grivich, P Grybos, 
P Hottowy, S Kachiguine, R S Kalmar, K Mathieson, 
D Petrusca, M Rahman, and A Sher, "What does the 
eye tell the brain?: Development of a system for the 
large-scale recording of retinal output activity," Nuclear 
S cience, IEEE Transactions on 51, 1434 - 1440 (2004). 
[12] Joshua A Weinstein, Ning Jiang, Richard A White, 
S Fisher, and Stephen R Quake, "High- 



[13] 



Joshua 
Daniel 

throughput sequencing of the zebrafish antibody reper- 
toire," Science 324, 807-10 (May 2009). 
Michele Ballerini, Nicola Cabibbo, Raphael Candelier, 
Andrea Cavagna, Evaristo Cisbani, Irene Giardina, Al- 
berto Orlandi, Giorgio Parisi, Andrea Procaccini, Massi- 
miliano Viale, and Vladimir Zdravkovic, "Empirical in- 
vestigation of starling flocks: a benchmark study in col- 
lective animal behaviour," Anim Behavl76, 201-215 (Jan 



[14] 



[15] 



[16] 



2008). 

Andrea Cavagna, Irene Giardina, Alberto Orlandi, Gior- 
gio Parisi, and Andrea Procaccini, "The STARFLAG 
handbook on collective animal behaviour: 2. Three- 
dimensional analysis," (Jan 2008). 

Andrea Cavagna, Irene Giardina, Alberto Orlandi, Gior- 
gio Parisi, Andrea Procaccini, Massimiliano Viale, and 
Vladimir Zdravkovic, "The STARFLAG handbook on 
collective animal behaviour: 1. Empirical methods," (Jan 
2008). 

Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Al- 
berto Orlandi, Giorgio Parisi, Andrea Procaccini, Raf- 
faele Santagati, and Fabio Stefanini, "New statistical 
tools for analyzing the structure of animal groups," Math 
Biosci 214, 32-37 (Jan 2008). 

M Ballerini, N Cabibbo, R Candelier, A Cavagna, E Cis- 
bani, I Giardina, V Lecomte, A Orlandi, G Parisi, A Pro- 
caccini, M Viale, and V Zdravkovic, "Interaction ruling 
animal collective behavior depends on topological rather 
than metric distance: evidence from a field study," Proc 
Natl Acad Sci USA 105, 1232-7 (Jan 2008). 



[19 

[20 
[21 



[22 
[23 



[18] D Bishop and J Reppy, "Study of the superfluid transi- 
tion in two-dimensional *He films," ,Phys Rev Lett, 40, 



[24 
[25 

[26 

[27 
[28; 
[29 
[30 

[31 
[32 

[33 
[34 
[35 

[37 



1727-1730 (Jun 1978). 

Elad Schneidman, Michael J Berry, Ronen Segev, 
and William Bialek, "Weak pairwise correlations imply 
strongly correlated network states in a neural popula- 
tion," Nature 440, 1007-12 (Apr 2006). 
Gasper Tkacik, Elad Schneidman, Michael J Berry II, 
and William Bialek, "Spin glass models for a network of 
real neurons," arXiv 0912.5409vl (Jan 2009). 
Aonan Tang, David Jackson, Jon Hobbs, Wei Chen, 
Jodi L Smith, Hema Patel, Anita Prieto, Dumitru Petr- 
usca, Matthew I Grivich, Alexander Sher, Pawel Hot- 
towy, Wladyslaw Dabrowski, Alan M Litke, and John M 
Beggs, "A maximum entropy model applied to spa- 
tial and temporal correlations from cortical networks in 
vitro," J Neurosci 28, 505-18 (Jan 2008). 
M E J Newman, "Power laws, pareto distributions and 
zipf's law," Contemporary Physics 46, 323 (Sep 2005). 
Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J New- 
man, "Power-law distributions in empirical data," Siam 
Rfiv 51, 661-703 (Jan 2009). 

George Kingsley Zipf, Human behavior and the principle 
of least effort (Addison- Wesley, Cambridge, 1949). 
F. Auerbach, "Das Gesetz der 

Bevolkerungskonzentration," Petermanns Geographische 
Mitteilungen 59, 74-76 (1913). 

Greg J Stephens, Thierry Mora, Gasper Tkacik, and 
William Bialek, "Thermodynamics of natural images," 
arXiv 806.2694yl, (Jun 2008). 

E. T Jaynes, "Information theory and statistical mechan- 
ics," Physical Review 106, 620 (May 1957). 
E. T Jaynes, "Information theory and statistical mechan- 
ics, ii," Physical Review 108, 171 (Oct 1957). 
T. M. Cover and J. A. Thomas, Elements of information 
theory (Wiley, New- York, 1991). 

Joseph B Keller and Bruno Zumino, "Determination of 
intermolecular potentials from thermodynamic data and 
the law of corresponding states," |J Chem Phy s 30, 1351 
(Aug 1959). 

D Ackley, G Hinton, and T Sejnowski, "A learning al- 
gorithm for boltzmann machines," Cognitive science 9, 
147-169 (Jan 1985). 

Jonathon Shlens, Greg D Field, Jeffrey L Gauthier, Mar- 
tin Greschner, Alexander Sher, Alan M Litke, and E J 
Chichilnisky, "The structure of large-scale synchronized 
firing in primate retina," J Neurosci 29, 5022-31 (Apr 
2009). 

Thierry Mora, Aleksandra M Walczak, William Bialek, 
and Curtis G Callan, "Maximum entropy models for an- 
tibody diversity," Pro c Natl Acad Sci USA| 107, 5405-10 
(Mar 2010). 

Tamara Broderick, Miroslav Dudik, Gasper Tkacik, 
Robert E Schapire, and William Bialek, "Faster solutions 
of the inverse pairwise ising problem," arXiv [0712.2437v2l 
(Jan 2007). 

Vitor Sessak and Remi Monasson, "Small-correlation ex- 
pansions for the inverse ising problem," |J Phys A-Math| 
Theor 42, 055001 (Jan 2009). 

"S^Cocco, S Leibler, and R Monasson, "Neuronal couplings 
between retinal ganglion cells inferred by efficient inverse 
statistical physics methods," Proc Natl Acad Sci USA| 
106, 14058-62 (Jul 2009). 

Marc Mezard and Thierry Mora, "Constraint satisfac- 
tion problems and neural networks: A statistical physics 
perspective," J Physiol Paris, 103, 107-13 (Jan 2009). 



20 



[38] Martin Weigt, Robert A White, Hendrik Szurmant, 
James A Hoch, and Terence Hwa, "Identification of direct 
residue contacts in protein-protein interaction by mes- 
sage passing," |Proc Natl Acad Sci USA 106, 67-72 (Jan 
2009). 

[39] F. Rieke, D. Warland, R. de Ryuter van Stevenick, and 
W. Bialek, Spikes: Exploring the Neural Code (MIT 
Press, Cambridge, 1997). 

[40] M Meister, L Lagnado, and D A Baylor, "Concerted sig- 
naling by retinal ganglion cells," Science 270, 1207-10 
(Nov 1995). 

[41] Elad Schneidman, Susanna Still, Michael J Berry, and 
William Bialek, "Network information and connected 
correlations," Phys Rev Lett 91, 238701 (Dec 2003). 

[42] Jonathon Shlens, Greg D Field, Jeffrey L Gauthier, 
Matthew I Grivich, Dumitru Petrusca, Alexander Sher, 
Alan M Litke, and E J Chichilnisky, "The structure of 
multi-neuron firing patterns in primate retina," J Neu- 
I rosci 26, 8254-66 (Aug 2006). 

[43] H Wassle, L Peichl, and B B Boycott, "Mosaics and ter- 
ritories of cat retinal ganglion cells," Prog Brain Res 58, 
183-90 (Jan 1983). 

[44] Shan Yu, Debin Huang, Wolf Singer, and Danko Nikolic, 
"A small world of neuronal synchrony," Cereb Cortex 18, 
2891-901 (Dec 2008). 

[45] Ifije E Ohiorhenuan and Jonathan D Victor, 
"Information-geometric measure of 3-neuron firing 
patterns characterizes scale-dependence in cortical 
networks," Journal of computationa l neuroscience| 
published online ahead of print (Jul 2010). 

[46] Ifije E Ohiorhenuan, Ferenc Mechler, Keith P Purpura, 
Anita M Schmid, Qin Hu, and Jonathan D Victor, 
"Sparse coding and high-order correlations in fine-scale 
cortical networks," Nature 466, 617-21 (Jul 2010). 

[47] Gasper Tkacik, Elad Schneidman, Michael J Berry II, 
and William Bialek, "Ising models for networks of real 
neurons," arXiv q-bio/0611072vl (Nov 2006). 

[48] M. Mezard, G. Parisi, and M. A. Virasoro, Spin-Glass 
Theory and Beyond, Lecture Notes in Physics, Vol. 9 
(World Scientific, Singapore, 1987). 

[49] C B Anfinsen, "Principles that govern the folding of pro- 
tein chains," Science 181, 223-30 (Jul 1973). 

[50] M H Cordes, A R Davidson, and R T Saner, "Sequence 
space, folding and protein design," Curr Opin Struct Biol 
6, 3-10 (Feb 1996). 

[51] Valerie Daggett and Alan Fersht, "The present view of 
the mechanism of protein folding," Nat Rev Mol Cell Biol, 
4, 497-502 (Jun 2003). 

[52] Richard Durbin, Sean R. Eddy, Anders Krogh, and 
Graeme Mitchison, Biological Sequence Analysis: Proba- 
bilistic Models of Proteins and Nucleic Acids (Cambridge 
University Press, Cambridge, 2005). 

[53] A Horovitz and A R Fersht, "Co-operative interactions 
during protein folding," J Mol Biol 224, 733-40 (Apr 
1992). 

[54] S W Lockless and R Ranganathan, "Evolutionarily con- 
served pathways of energetic connectivity in protein fam- 
ilies," Science 286, 295-9 (Oct 1999). 

[55] Michael Socolich, Steve W Lockless, William P Russ, 
Heather Lee, Kevin H Gardner, and Rama Ranganathan, 
"Evolutionary information for specifying a protein fold," 
[Nature 437, 512-8 (Sep 2005). 

[56] William P Russ, Drew M Lowery, Prashant Mishra, 
Michael B Yaffe, and Rama Ranganathan, "Natural-like 



function in artificial ww domains," ' Nature] 437, 579-83 
(Sep 2005). 

[57] William Bialek and Rama Ranganathan, "Rediscovering 
the power of pairwise interactions," arXiv [07l2.4397vl| 
(Jan 2007). 

[58] Najeeb Halabi, Olivier Rivoire, Stanislas Leibler, and 
Rama Ranganathan, "Protein sectors: evolutionary units 
of three-dimensional structure," jCell 138, 774-86 (Aug 
2009). 

[59] Bryan Lunt, Hendrik Szurmant, Andrea Procaccini, 
James A Hoch, Terence Hwa, and Martin Weigt, "In- 
ference of direct residue contacts in two-component sig- 
naling," Meth Enzymol 471, 17-41 (Jan 2010). 

[60] Alexander Schug, Martin Weigt, Jose N Onuchic, Terence 
Hwa, and Hendrik Szurmant, "High-resolution protein 
complexes from integrating genomic information with 
molecular simulation," |Proc Natl Acad Sci USA 106, 
22124-9 (Dec 2009). 

[61] Kenneth P. Murphy, Paul Travers, Charles Janeway, and 
Mark Walport, Janeway's immunobiology (Garland, New 
York, 2008). 

[62] Dr Jens Krause and Graeme D. Ruxton, Living in groups 
(Oxford University Press, Oxford, 2002). 

[63] Irene Giardina, "Collective behavior in animal groups; 
theoretical models and empirical studies," |HFSP J| 2, 
205-19 (Aug 2008). 

[64] Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Gior- 
gio Parisi, Raffaele Santagati, Fabio Stefanini, and Massi- 
miliano Viale, "Scale-free correlations in starling flocks," 
Proc Natl Acad Sci USA 107, 11865-70 (Jun 2010). 

[65] J M CuUen, E Shaw, and H A Baldwin, "Methods 
for measuring the three-dimensional structure of fish 
schools," Anim Behav 13, 534-43 (Oct 1965). 

[66] Kerson Huang, Statistical Mechanics, 2Nd Ed (Wiley, 
New York, 1987). 

[67] T Vicsek, A Czirok, E Ben- Jacob, I Cohen, and 
O Shochet, "Novel type of phase transition in a system of 
self-driven particles," Phys Rev Lett 75, 1226-1229 (Aug 
1995). 

[68] J Toner and Y Tu, "Long-range order in a two- 
dimensional dynamical xy model: How birds fiy to- 
gether," Phys Rev Lett 75, 4326-4329 (Dec 1995). 

[69] J Toner and YH Tu, "Flocks, herds, and schools: A quan- 
titative theory of flocking," Phys Rev E 58, 4828-4858 
(Jan 1998). 

[70] A Corral, C Perez, A Diaz-Guilera, and A Arenas, "Self- 
organized criticality and synchronization in a lattice 
model of integrate-and-fire oscillators," Phys Rev Lett 
74, 118-121 (Jan 1995). 

[71] A Herz and J Hopfield, "Earthquake cycles and neural 
reverberations: Collective oscillations in systems with 
pulse-coupled threshold elements," Phys Rev Lett 75, 
1222-1225 (Aug 1995). 

[72] Dan-Mei Chen, S Wu, A Guo, and Z Yang, "Self- 
organized criticality in a cellular automaton model of 
pulse-coupled integrate-and-fire neurons," Journal of| 
^ Physics A: Mathematical and General 28, 5177 (Sep 

[73] Per Bak and Chao Tang, "Earthquakes as a self- 
organized critical phenomenon," J Geophys Res 94, 
15635-15,637 (1989). 

[74] John M Beggs and Dietmar Plenz, "Neuronal avalanches 
in neocortical circuits," J Neurosci 23, 11167-77 (Dec 



21 



2003). 

[75] John M Beggs, "The criticahty hypothesis: how local cor- 
tical networks might optimize information processing," 
"Philo s Transact A Math Phys Eng Sci, 366, 329-43 (Feb 
2008). 

[76] Theodore E. Harris, The theory of branching processes 
(Springer, Berlin, 1949). 

[77] S Zapperi, Bffikgaard Lauritsen K, and H Stanley, "Self- 
organized branching processes: Mean- field theory for 
avalanches," Phys Rev Lett 75, 4071-4074 (Nov 1995). 

[78] Wei Chen, Jon P Hobbs, Aonan Tang, and John M Beggs, 
"A few strong connections: optimizing information reten- 
tion in neuronal avalanches," BMC Neurosci| ll, 3 (Jan 
2010). 

[79] Clayton Haldeman and John M Beggs, "Critical branch- 
ing captures activity in living neural networks and max- 
imizes the number of metastable states," Phys Rev Lett 
94, 058101 (Feb 2005). 

[80] John M Beggs and Dietmar Plenz, "Neuronal avalanches 
are diverse and precise activity patterns that are stable 
for many hours in cortical slice cultures," ,J Neurosci, 24, 
5216-29 (Jun 2004). 

[81] T. Gold, "Hearing. II. the physical basis of the action of 
the cochlea," Proc R Soc Lond B Biol Sci 135, 492498 
(1948). 

[82] D T Kemp, "Stimulated acoustic emissions from within 
the human auditory system," J Acoust Soc Am 64, 1386- 
91 (Nov 1978). 

[83] PM Zurek, "Spontaneous narrowband acoustic signals 
emitted by human ears," J Acoust Soc Am 69, 514-523 
(1981). 



[84] A J Hudspeth, Frank Jiilicher, and Pascal Martin, "A cri- 
tique of the critical co chlea: Hopf-a bifurcation-is better 
than none," || TNeurop hysiol 104, 1219-29 (Sep 2010). 

[85] M Ospeck, V M Egufluz, and M O Magnasco, "Evidence 
of a hopf bifurcation in frog hair cells," Biophys J 80, 
2597-607 (Jun 2001). 

[86] Y Choe, M O Magnasco, and A J Hudspeth, "A model for 
amplification of hair-bundle motion by cyclical binding of 
Ca^""" to mechanoelectrical-transduction channels," Proc 
Natl Acad Sci USA 95, 15321-6 (Dec 1998). 

[87] V M Egm'luz, M Ospeck, Y Choe, A J Hudspeth, and 
M O Magnasco, "Essential nonlinearities in hearing," 
Phys Rev Lett 84, 5232-5 (May 2000). 

[88] S Camalet, T Duke, F Jiilicher, and J Prost, "Auditory 
sensitivity provided by self-tuned critical oscillations of 
hair cells," Proc Natl Acad Sci USA 97, 3183-8 (Mar 
2000). 

[89] Marcelo O Magnasco, "A wave traveling over a hopf in- 
stability shapes the cochlear tuning curve," Phys Rev 
Lett 90, 058101 (Feb 2003). 

[90] Thomas Duke and Frank Jiilicher, "Active traveling wave 
in the cochlea," Phys Rev Lett 90, 158101 (Apr 2003). 

[91] MO Magnasco, O Piro, and GA Cecchi, "Self-tuned crit- 
ical anti-hebbian networks," Phys Rev Lett 102, 258102 
(2009). 

[92] G Yule, "A mathematical theory of evolution, based on 
the conclusions of Dr. J. C. Willis, F.R.S," Philosophi- 
cal Transactions of the Royal Society of London. Series 
BContaining Papers of a Biological Character 213, 21-87 
(Jan 1925). 



